Unsupervised Disambiguation of Image Captions

نویسندگان

  • Wesley May
  • Sanja Fidler
  • Afsaneh Fazly
  • Sven J. Dickinson
  • Suzanne Stevenson
چکیده

Given a set of images with related captions, our goal is to show how visual features can improve the accuracy of unsupervised word sense disambiguation when the textual context is very small, as this sort of data is common in news and social media. We extend previous work in unsupervised text-only disambiguation with methods that integrate text and images. We construct a corpus by using Amazon Mechanical Turk to caption sensetagged images gathered from ImageNet. Using a Yarowsky-inspired algorithm, we show that gains can be made over text-only disambiguation, as well as multimodal approaches such as Latent Dirichlet Allocation.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning

State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images, which is a much more abundant commodity. We here propose a novel way of using such textual data by artificially generating missing visual information. We eva...

متن کامل

Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings

We introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image. Just as textual word sense disambiguation is useful for a wide range of NLP tasks, visual sense disambiguation can be useful for multimodal tasks such as image retrieval, image description, and text illust...

متن کامل

Variational Autoencoder for Deep Learning of Images, Labels and Captions

A novel variational autoencoder is developed to model images, as well as associated labels or captions. The Deep Generative Deconvolutional Network (DGDN) is used as a decoder of the latent image features, and a deep Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used to approximate a distribution for the latent DGDN features/code. The latent code is also linked to g...

متن کامل

Unsupervised Texture Image Segmentation Using MRFEM Framework

Texture image analysis is one of the most important working realms of image processing in medical sciences and industry. Up to present, different approaches have been proposed for segmentation of texture images. In this paper, we offered unsupervised texture image segmentation based on Markov Random Field (MRF) model. First, we used Gabor filter with different parameters’ (frequency, orientatio...

متن کامل

News Image Annotation on a Large Parallel Text-image Corpus

In this paper, we present a multimodal parallel text-image corpus, and propose an image annotation method that exploits the textual information associated with images. Our corpus contains news articles composed of a text, images and image captions, and is significantly larger than the other news corpora proposed in image annotation papers (27,041 articles and 42,568 captionned images). In our e...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012